All coding steps will be done in Python. If you need help setting up your machine, please refer to this link.
Before you start, make sure to import and load all the necessary packages:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from matplotlib import pyplot as plt
from plotnine import *
from great_tables import GT, md
from mizani.formatters import percent_format
from statsmodels.discrete.discrete_model import MNLogit
from statsmodels.tools.tools import add_constant
Recap on Logit Models
From previous classes, the idea behind using the logistic function \(\Lambda(X)\) rests on the latent-variable approach: think about an unobserved component, \(Y^\star\), which is a continuous variable, such as how much a consumer values a product.
Although we do not observe \(Y^\star\), we do observe consumers’ decisions of buying or not buying the product, depending on a given threshold (normalized here to zero):
\[
Y = \begin{cases} 1 & \text{if } Y^\star > 0 \\ 0 & \text{otherwise} \end{cases}
\]
With \(p = P(Y = 1 \mid X)\), the term \(\frac{p}{1 - p}\) is called the odds (referred to loosely as the odds-ratio in what follows), and is simply the ratio of the probability of success to the probability of failure.
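The link between the probability, the odds, and the linear index can be checked numerically. The snippet below is a minimal sketch: the function name `logistic` and the index value `0.8` are illustrative assumptions, not quantities from the lecture data.

```python
import math

def logistic(z):
    """Logistic CDF: Lambda(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear index X*beta for one consumer (illustrative value only)
index = 0.8

p = logistic(index)        # probability of buying
odds = p / (1.0 - p)       # probability of success over probability of failure
log_odds = math.log(odds)  # taking logs recovers the linear index

print(f"p = {p:.4f}, odds = {odds:.4f}, log-odds = {log_odds:.4f}")
```

Taking the log of the odds returns the linear index, which is why the logit model is linear in the log odds.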
Hands-On Exercise
Description
You were hired as a consultant to work on a research project that aims to understand how individuals travel around Spain. A survey was administered to more than 100 customers to understand their consumption patterns when travelling around the country by bus, car, train, or airplane. You have been given the following tasks:
How do customers decide which transportation method to use?
What would be the effect of decreasing transportation costs on the market share of each transportation method?
You can download the transportation-dataset.csv data using the Download button below.
Note: this dataset has been adapted to fit the purpose of the lecture. You can find the original dataset here.
\(\rightarrow\) This example has been taken from Train (2009).
From Discrete to Multiple Choice
In practice, decisions are, in general, more complicated than a simple binary choice:
Consumers choose a product from a finite set of alternatives;
Passengers choose which transportation method is the preferred one for their trips;
Individuals choose which payment method is the best to accommodate their budget constraints.
In more practical cases, the number of choices \(k\) is, in general, greater than \(2\). In this lecture, we will see that the estimation method used for the logit can be extended to cases where \(k > 2\).
This method is generally called multinomial logit and, like the logit, is an example of a generalized linear model, a broader class of models that generalizes the multiple linear regression model.
Bridging Binary to Multi-Choice Models (continued)
Recall that, in a binary setting, the probability of success under a logit model was:
\[
P(Y = 1 \mid X) = \frac{\exp(X\beta)}{1 + \exp(X\beta)}
\]
which we will denote by \(p\).
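Let’s write the probabilities of each scenario, \(y = 1\) and \(y = 0\), for a given customer \(i\) as:
\[
P(y_i = 1 \mid X_i) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)} = p_i, \qquad P(y_i = 0 \mid X_i) = \frac{1}{1 + \exp(X_i\beta)} = 1 - p_i
\]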
We can use the intuition on these probabilities to arrive at a general formulation.
Suppose that we take \(y = 0\) (in our previous example, non-churned customers) as the baseline reference. Then the logit regression for customer \(i\) simply estimates the log odds-ratio:
\[
\log\left(\frac{P(y_i = 1 \mid X_i)}{P(y_i = 0 \mid X_i)}\right) = X_i\beta
\]
In words, we are estimating the log ratio of probabilities, taking into account that the baseline category is \(y = 0\).
Relating this to the churn example, we are modeling how the log odds of churning (relative to not churning) change with the observed covariates \(X\).
What if we have more than two categories?
Suppose now that we have \(k > 2\). For example, instead of thinking about whether a customer churned, we can observe their actual bank choice: \(\{Santander, KutxaBank, BankInter\}\)
Our set now has three potential choices. We can write the probability of customer \(i\) choosing bank \(k = z\) as:
\[
p_{i,z} = P(y_i = z \mid X_i), \qquad \sum_{z=1}^{3} p_{i,z} = 1
\]
Let’s say that we want to fit a model to understand customers’ preferences over banking service providers. Formally, we have information on covariates \(X\), and we want to fit a model that recovers the probability of choosing alternative \(k\) based on \(X\).
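For that, fix \(k = 1\) (in our case, Santander) to be the baseline category. Then:
\[
\log\left(\frac{P(y_i = k \mid X_i)}{P(y_i = 1 \mid X_i)}\right) = X_i\beta_k, \qquad k = 2, 3
\]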
In words, for each bank \(k\) other than \(k = 1\) (Santander), we will have a separate equation that measures the log odds-ratio of preferring bank \(k\) over Santander
Therefore, if the response variable has \(k\) possible categories, there will be \(k - 1\) equations.
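To make the \(k - 1\) equations concrete, the sketch below turns a set of baseline-category coefficients into choice probabilities. The helper name `baseline_probabilities` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def baseline_probabilities(x, betas):
    """Choice probabilities with the first category as the baseline.

    x     : (n_features,) covariates for one customer
    betas : (k - 1, n_features) coefficients, one row per non-baseline category
    """
    # The baseline index is fixed at zero; the other k - 1 indices are X * beta_k
    indices = np.concatenate(([0.0], betas @ x))
    # Subtract the maximum before exponentiating for numerical stability
    expv = np.exp(indices - indices.max())
    return expv / expv.sum()

# Hypothetical values: an intercept plus one covariate, three banks
x = np.array([1.0, 2.5])
betas = np.array([[0.2, -0.1],   # bank 2 versus the baseline
                  [-0.5, 0.3]])  # bank 3 versus the baseline

probs = baseline_probabilities(x, betas)
log_odds_vs_baseline = np.log(probs[1:] / probs[0])

print("Probabilities:", probs)
print("Log odds versus baseline:", log_odds_vs_baseline)
```

With three categories there are two rows of coefficients, and the log odds of each non-baseline category relative to the baseline equal the corresponding linear index.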
About the dataset
We will use a different dataset to understand multi-choice models.
This data comes from Greene (2003) and consists of a survey about the preferred travel transportation method (air, bus, car, or train) for a sample of individuals.
We observe the following characteristics:
Characteristics of the choice: vehicle cost, waiting time, travel time
Characteristics of the individuals: income, family size
The actual choice made by each individual: \(\{air, bus, train, car\}\)
We want to understand how both offer characteristics as well as individual characteristics drive the decision to use a specific transportation method
Estimating a Multinomial Logit
A multinomial logit model is estimated by maximum likelihood: we choose the parameters that maximize the likelihood of observing the decision made by each consumer \(i\).
As before, the choice probabilities are viewed as functions of the parameters, and the multinomial likelihood is maximized over them.
Unlike the binary case, we now have one equation for each \(k \neq 1\). Note that it makes no difference which category we pick as the reference, because we can always convert from one formulation to the other, just as with the choice of the omitted category when defining dummy variables.
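The claim that the reference category is arbitrary can be checked with a small numerical sketch. Everything below is an assumption made for illustration: the data are randomly generated, the coefficients are not estimates, and `choice_probs` and `log_likelihood` are hypothetical helper names rather than part of any library.

```python
import numpy as np

def choice_probs(X, B):
    """Softmax probabilities; B holds one column of coefficients per category."""
    V = X @ B
    V = V - V.max(axis=1, keepdims=True)  # numerical stability
    expV = np.exp(V)
    return expV / expV.sum(axis=1, keepdims=True)

def log_likelihood(X, y, B):
    """Sum over customers of the log probability of the observed choice."""
    P = choice_probs(X, B)
    return np.log(P[np.arange(len(y)), y]).sum()

rng = np.random.default_rng(0)
n, k = 200, 4                                          # made-up sample size, four modes
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one covariate
y = rng.integers(0, k, size=n)                         # made-up observed choices

# Coefficients with category 0 as the reference (its column is zero)
B_ref0 = np.column_stack([np.zeros(2), rng.normal(size=(2, k - 1))])
# The same model re-expressed with category 2 as the reference
B_ref2 = B_ref0 - B_ref0[:, [2]]

ll0 = log_likelihood(X, y, B_ref0)
ll2 = log_likelihood(X, y, B_ref2)
print(ll0, ll2)
```

Subtracting the coefficients of the new reference category from every column leaves all probabilities, and therefore the log-likelihood, unchanged.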
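Recall that we recovered these probabilities by fixing a reference category (in this case, air).
Therefore, to recover the probability of an individual choosing air:
\[
P(y_i = \text{air} \mid X_i) = 1 - \sum_{k \neq \text{air}} P(y_i = k \mid X_i) = \frac{1}{1 + \sum_{k \neq \text{air}} \exp(X_i\beta_k)}
\]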
What happens to the probability of using each mode as the size of the family grows?
In order to analyze that, we will:
Make a copy of the original matrix of covariates, \(X\)
Use a loop that sets the family size of every customer to each value from \(1\) to \(5\)
Predict the probabilities for each choice and average them across customers
Append the results
Result: as family size grows, consumers switch to cars. The predicted car share rises from 21.31% to 54.15%, while the bus share falls from 20.71% to 1.03%.
Counterfactual Exercise 1 - Probabilities

| Size | Air | Bus | Car | Train |
|---|---|---|---|---|
| 1 | 29.10% | 20.71% | 21.31% | 28.88% |
| 2 | 26.34% | 11.12% | 30.24% | 32.30% |
| 3 | 23.19% | 5.33% | 38.63% | 32.85% |
| 4 | 19.58% | 2.38% | 46.42% | 31.62% |
| 5 | 15.24% | 1.03% | 54.15% | 29.58% |
# Map the columns of the predicted-probability matrix to mode labels
class_mapping = {0: 'air', 1: 'bus', 2: 'car', 3: 'train'}

# Define size buckets
size_buckets = np.arange(1, 6, 1)

# Store the results
results = pd.DataFrame()

for i in size_buckets:
    # Counterfactual data: current values of X, with family size set to i for everyone
    X_Temp = X.copy()
    X_Temp['size'] = i

    # Predict probabilities for the counterfactual data and average across customers
    counterfactual_probs = pd.DataFrame({'estimate': multinomial_result.predict(X_Temp).mean(axis=0)}).T
    counterfactual_probs.columns = counterfactual_probs.columns.map(class_mapping)
    counterfactual_probs['size'] = i
    counterfactual_probs = counterfactual_probs.reindex(columns=['size', 'air', 'bus', 'car', 'train'])

    # Append
    results = pd.concat([results, counterfactual_probs], axis=0, ignore_index=True)

# Display the table of average predicted probabilities
Table = (
    GT(results)
    .cols_align('center')
    .tab_stub(rowname_col='size')
    .fmt_percent()
    .cols_label(size='Size', air='Air', bus='Bus', car='Car', train='Train')
    .tab_header(title=md("**Counterfactual Exercise 1 - Probabilities**"))
    .opt_stylize(style=1, color='red')
)
Table.tab_options(table_width="100%", table_font_size="25px")
How many customers would upgrade their travel plans to air if they receive an increase in income?
In order to analyze that, we will:
Make a copy of the original matrix of covariates, \(X\)
Use a loop iterating different sets of customers, with income ranging from \(0\) to \(100\) by increments of \(10\)
Predict the probabilities for each choice and average them across customers
Append the results
Counterfactual Exercise 2 - Probabilities

| Income Level | Air | Bus | Car | Train |
|---|---|---|---|---|
| 0 | 22.47% | 11.01% | 9.11% | 57.41% |
| 10 | 24.07% | 13.01% | 13.67% | 49.25% |
| 20 | 25.46% | 14.72% | 19.53% | 40.29% |
| 30 | 26.67% | 15.82% | 26.33% | 31.17% |
| 40 | 27.70% | 16.11% | 33.44% | 22.75% |
| 50 | 28.56% | 15.57% | 40.15% | 15.71% |
| 60 | 29.27% | 14.38% | 45.99% | 10.35% |
| 70 | 29.86% | 12.79% | 50.78% | 6.57% |
| 80 | 30.35% | 11.04% | 54.55% | 4.06% |
| 90 | 30.78% | 9.31% | 57.46% | 2.45% |
| 100 | 31.16% | 7.72% | 59.66% | 1.46% |
# Define income buckets
income_buckets = np.arange(0, 110, 10)

# Store the results
results = pd.DataFrame()

for i in income_buckets:
    # Counterfactual data: current values of X, with income set to i for everyone
    X_Temp = X.copy()
    X_Temp['income'] = i

    # Predict probabilities for the counterfactual data and average across customers
    counterfactual_probs = pd.DataFrame({'estimate': multinomial_result.predict(X_Temp).mean(axis=0)}).T
    counterfactual_probs.columns = counterfactual_probs.columns.map(class_mapping)
    counterfactual_probs['income'] = i
    counterfactual_probs = counterfactual_probs.reindex(columns=['income', 'air', 'bus', 'car', 'train'])

    # Append
    results = pd.concat([results, counterfactual_probs], axis=0, ignore_index=True)

# Display the table of average predicted probabilities
Table = (
    GT(results)
    .cols_align('center')
    .tab_stub(rowname_col='income')
    .fmt_percent()
    .cols_label(income='Income Level', air='Air', bus='Bus', car='Car', train='Train')
    .tab_header(title=md("**Counterfactual Exercise 2 - Probabilities**"))
    .opt_stylize(style=1, color='red')
)
Table.tab_options(table_width="100%", table_font_size="25px")
How many customers would upgrade their travel plans to air if travel costs decrease?
In order to analyze that, we will:
Make a copy of the original matrix of covariates, \(X\)
Create a range of counterfactual scenarios, with travel costs set to \(0\%\) through \(100\%\) of the original cost for each customer, in steps of \(10\%\)
Predict the probabilities for each choice and average them across customers
Append the results
Counterfactual Exercise 3 - Probabilities

| % of Original Travel Cost | Air | Bus | Car | Train |
|---|---|---|---|---|
| 0% | 99.97% | 0.00% | 0.02% | 0.01% |
| 10% | 99.03% | 0.03% | 0.64% | 0.30% |
| 20% | 90.16% | 0.81% | 4.80% | 4.23% |
| 30% | 69.24% | 3.70% | 13.76% | 13.29% |
| 40% | 60.21% | 5.23% | 17.65% | 16.91% |
| 50% | 54.74% | 6.39% | 19.33% | 19.54% |
| 60% | 48.35% | 7.84% | 21.23% | 22.57% |
| 70% | 41.30% | 9.57% | 23.50% | 25.63% |
| 80% | 35.30% | 11.28% | 25.54% | 27.88% |
| 90% | 30.86% | 12.84% | 27.05% | 29.24% |
| 100% | 27.62% | 14.29% | 28.10% | 30.00% |
# Define travel buckets (share of the original value, from 0% to 100%)
travel_buckets = np.arange(0, 1.1, 0.1)

# Store the results
results = pd.DataFrame()

for i in travel_buckets:
    # Counterfactual data: current values of X, with 'travel' scaled by the factor i
    X_Temp = X.copy()
    X_Temp['travel'] = X_Temp['travel'] * i

    # Predict probabilities for the counterfactual data and average across customers
    counterfactual_probs = pd.DataFrame({'estimate': multinomial_result.predict(X_Temp).mean(axis=0)}).T
    counterfactual_probs.columns = counterfactual_probs.columns.map(class_mapping)
    counterfactual_probs['travel'] = "{:.0%}".format(i)
    counterfactual_probs = counterfactual_probs.reindex(columns=['travel', 'air', 'bus', 'car', 'train'])

    # Append
    results = pd.concat([results, counterfactual_probs], axis=0, ignore_index=True)

# Display the table of average predicted probabilities
Table = (
    GT(results)
    .cols_align('center')
    .tab_stub(rowname_col='travel')
    .fmt_percent()
    .cols_label(travel='% of Original Travel Cost', air='Air', bus='Bus', car='Car', train='Train')
    .tab_header(title=md("**Counterfactual Exercise 3 - Probabilities**"))
    .opt_stylize(style=1, color='red')
)
Table.tab_options(table_width="100%", table_font_size="25px")
Limitations of the Multinomial Logit
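Recall that we estimated our predicted probabilities by looking at:
\[
P(y_i = k \mid X_i) = \frac{\exp(X_i\beta_k)}{\sum_{j} \exp(X_i\beta_j)}
\]
where the coefficients of the reference category are normalized to zero.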
As a consequence, the probabilities of two choices are the same if they have the same characteristics
This creates room for unreasonable substitution patterns. One of the most important properties of the multinomial logit model is the Independence from Irrelevant Alternatives (IIA)
The IIA property states that for any individual, the ratio of probabilities of choosing two alternatives is independent of the availability or attributes of any other alternatives
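Consider the probabilities of choosing mode = bus (\(k = 1\)) and mode = train (\(k = 2\)). According to the multinomial model, these are:
\[
P(y_i = 1 \mid X_i) = \frac{\exp(X_i\beta_1)}{\sum_{j} \exp(X_i\beta_j)}, \qquad P(y_i = 2 \mid X_i) = \frac{\exp(X_i\beta_2)}{\sum_{j} \exp(X_i\beta_j)}
\]
so their ratio, \(\exp(X_i\beta_1)/\exp(X_i\beta_2)\), does not involve any other alternative.
The IIA property therefore restricts the substitution patterns that the multinomial logit model can predict.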
A well-known paradox illustrating this is the red-bus/blue-bus paradox:
1. Suppose you have two alternatives with identical properties: car and red bus.
2. If their observable attributes are exactly the same, we should expect the market share of each alternative to be the same (i.e., \(50\%\) each).
3. Now, assume a blue bus category is introduced with the same observable characteristics. You would expect the new probabilities to be:
Car: \(50\%\)
Red bus: \(25\%\)
Blue bus: \(25\%\)
Problem: due to the IIA, a multinomial logit model would predict a share of about \(33\%\) for each option. This is counterintuitive: in terms of valued features, the red and blue buses are essentially the same, so the blue bus should draw its share from the red bus rather than from the car.
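The paradox can be reproduced in a few lines. This is a minimal sketch that assumes all alternatives have the same representative utility, normalized to zero; `logit_shares` is a hypothetical helper, not a library function.

```python
import numpy as np

def logit_shares(utilities):
    """Multinomial logit shares for a vector of representative utilities."""
    expu = np.exp(utilities - np.max(utilities))
    return expu / expu.sum()

# Identical observable attributes imply identical utilities (normalized to 0)
before = logit_shares(np.array([0.0, 0.0]))      # car, red bus
after = logit_shares(np.array([0.0, 0.0, 0.0]))  # car, red bus, blue bus

print("Before the blue bus:", before)
print("After the blue bus: ", after)
# The car / red-bus ratio is unchanged by the new alternative (IIA)
print("Car/red-bus ratio:", before[0] / before[1], "->", after[0] / after[1])
```

The car-to-red-bus ratio stays at one, so the model has no choice but to take share away from the car when the blue bus appears.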
To what extent do the implications of the IIA affect our problem? To see this, suppose that the cost of using the bus increases by \(10\%\).
In that case, individuals who stop using buses because of the cost increase are predicted to distribute themselves among the remaining modes in proportion to the initial probabilities of choosing those modes.
In other words, the ratios of probabilities across the other choices remain intact.
Does this sound reasonable? Likely not! If bus prices increase, we would expect the air and train shares to respond differently:
Since bus and train are closely related, one would expect marginal bus users to move towards train.
As a consequence, the number of marginal users moving to air should be much smaller.
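The proportional-substitution prediction can be illustrated numerically. The utilities below are hypothetical values, not estimates from the transportation dataset, and the bus cost increase is represented as an assumed drop of 0.5 in the bus utility.

```python
import numpy as np

modes = ["air", "bus", "car", "train"]

def logit_shares(utilities):
    """Multinomial logit shares for a vector of representative utilities."""
    expu = np.exp(utilities - np.max(utilities))
    return expu / expu.sum()

# Hypothetical representative utilities (illustration only)
v_base = np.array([0.4, -0.3, 0.5, 0.2])
v_new = v_base.copy()
v_new[modes.index("bus")] -= 0.5  # assumed utility loss from a costlier bus

base = logit_shares(v_base)
new = logit_shares(v_new)

# Growth factor of each non-bus mode after the bus becomes less attractive
others = [i for i, m in enumerate(modes) if m != "bus"]
growth = new[others] / base[others]

for i, g in zip(others, growth):
    print(f"{modes[i]:>5}: {base[i]:.4f} -> {new[i]:.4f} (x{g:.4f})")
```

Air, car, and train all grow by the same factor, so former bus users are spread across them in proportion to their initial shares, regardless of how similar each mode is to the bus.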
Rethinking our use of multinomial logit
How can we change this? By adding individual preference heterogeneity to the choices:
\[
V_{i,k}= \underbrace{\alpha_k+X\beta_{k}}_{\text{Average utility for choice k}} + \underbrace{\mu_{i,k}}_{\text{Customer-specific utility for k}}
\]
In the red bus/blue bus paradox, this would allow \(50\%\) of the customers to really prefer cars
In our practical example, this would allow air passengers to stick with their choices regardless of the changes in bus fares
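As a rough illustration of how heterogeneity restores sensible shares in the red bus/blue bus setting, the sketch below mixes two customer types. The two-type setup and the values \(\pm 10\) for the customer-specific term are assumptions made for this example only.

```python
import numpy as np

def logit_shares(utilities):
    """Multinomial logit shares for a vector of utilities."""
    expu = np.exp(utilities - np.max(utilities))
    return expu / expu.sum()

alternatives = ["car", "red bus", "blue bus"]

# Average utility is identical across alternatives (normalized to zero)
mean_utility = np.zeros(3)

# Hypothetical customer-specific terms: two equally sized types
mu_car_lover = np.array([10.0, 0.0, 0.0])   # strongly prefers the car
mu_bus_lover = np.array([-10.0, 0.0, 0.0])  # strongly prefers buses

# Market shares are the average of the type-specific logit shares
shares = (0.5 * logit_shares(mean_utility + mu_car_lover)
          + 0.5 * logit_shares(mean_utility + mu_bus_lover))

for name, s in zip(alternatives, shares):
    print(f"{name:>8}: {s:.4f}")
```

Within each type the IIA still holds, but at the aggregate level the car keeps roughly half of the market and the two buses split the rest.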
Implementations:
Nested logit models: customers first choose between a high-cost and a low-cost group of modes, and then choose the specific transportation method within that group
BLP estimation - see pyBLP for an implementation using Python
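For intuition on the nested logit, the sketch below applies the standard nested logit formula to the red bus/blue bus case. The nesting (car alone, the two buses together), the single dissimilarity parameter `lam`, and the function name `nested_logit_shares` are assumptions for illustration; the nests differ from the high/low cost grouping mentioned above.

```python
import numpy as np

def nested_logit_shares(utilities, nests, lam):
    """Nested logit shares.

    utilities : dict mapping alternative -> representative utility
    nests     : list of lists of alternatives (a partition of the choice set)
    lam       : dissimilarity parameter in (0, 1], shared by all nests here
    """
    # Sum of exp(V_j / lam) within each nest
    nest_sums = [sum(np.exp(utilities[a] / lam) for a in nest) for nest in nests]
    denom = sum(s ** lam for s in nest_sums)
    shares = {}
    for nest, s in zip(nests, nest_sums):
        p_nest = s ** lam / denom  # probability of choosing the nest
        for a in nest:
            p_within = np.exp(utilities[a] / lam) / s  # probability within the nest
            shares[a] = p_nest * p_within
    return shares

utilities = {"car": 0.0, "red bus": 0.0, "blue bus": 0.0}
nests = [["car"], ["red bus", "blue bus"]]

mnl = nested_logit_shares(utilities, nests, lam=1.0)      # collapses to multinomial logit
nested = nested_logit_shares(utilities, nests, lam=0.05)  # buses are close substitutes

print("lam = 1.00:", mnl)
print("lam = 0.05:", nested)
```

With `lam = 1` the model reproduces the one-third shares of the multinomial logit; as `lam` approaches zero the buses become near-perfect substitutes and the car share approaches one half.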
References
Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. 2nd ed. Cambridge, England: Cambridge University Press.